The contribution of alternative splicing probability to the coding expansion of the genome

Authors

  • Fernando Carrillo Oesterreich
  • Hugo Bowne-Anderson
  • Jonathon Howard

Abstract

Alternative splicing results in the inclusion or exclusion of exons in an RNA, thereby allowing a single gene to code for multiple RNA isoforms. Genes are often composed of many exons, allowing combinatorial choice to significantly expand the coding potential of the genome. How much coding potential is gained by alternative splicing, and what is the main contributor: alternative-splicing depth or exon count? Here we develop a splice-site-centric quantification method that lets us characterize transcriptome-wide alternative splicing with a simple probabilistic model, enabling species-wide comparison. We use information theory to quantify the gain in coding potential and show that an increase in alternative splicing probability contributes more to transcriptome expansion than exon count does. Our results suggest that dominant isoforms are co-expressed alongside many minor isoforms. We propose that this solves two problems simultaneously: expression of functional isoforms, and expansion of the transcriptome landscape, potentially without a direct function but available for evolution.

Glossary

  • Transcriptome: Set of all RNA molecules in a sample (e.g. cell, tissue, organism).
  • Transcriptome expansion: Increase in the coding potential of the genome.
  • Gene annotation: Meta information added to the raw DNA sequence, such as exon-intron structure.
  • Gene architecture: Exon-intron structure of genes.
  • RNA splicing: RNA maturation event leading to removal of introns and joining of exons.
  • Splice site: Exon-intron boundary (5' splice site) or intron-exon boundary (3' splice site).
  • Constitutive splicing: The process that results in the joining of two splice sites in all observed situations.
  • Alternative splicing: The process by which one splice site can be joined to distinct partner splice sites.
  • RNA-seq experiment: Qualitative and quantitative profiling of the transcriptome by deep sequencing.
  • Extent: A parameter used to characterize the amount of alternative splicing in any given transcriptome; technically, the extent β is defined in terms of α, the exponent of the power-law distribution that describes the amount of alternative splicing in the transcriptome.
  • Splice site expression: Number of RNA-seq observations per splice site.
  • Shannon entropy: Metric of the expected information content.
  • True diversity: An ecological concept that measures both the number of distinct species (richness) and how uniformly they are distributed in a sample (evenness).
  • Machine learning: Computational algorithms that learn rules (a model) to predict an output from an input.
  • Random forest: A non-linear machine learning model based on an ensemble of decision trees with random feature subset selection at each decision node.
  • Lasso regression: Linear regression regularized by absolute value of the sum …
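To make the glossary entries for Shannon entropy and true diversity concrete, the following is a minimal Python sketch. It is not the authors' splice-site-centric power-law model: the function names and the toy assumption that every exon is skipped independently with the same probability `p_skip` are purely illustrative. Under that assumption, it shows how both exon count and splicing probability change the entropy of the isoform distribution.

```python
import math
from itertools import product


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def true_diversity(probs):
    """Effective number of equally abundant isoforms: 2 ** entropy (in bits)."""
    return 2 ** shannon_entropy(probs)


def isoform_distribution(n_exons, p_skip):
    """Isoform probabilities under a toy model (an assumption, not the paper's):
    each of n_exons is skipped independently with probability p_skip."""
    dist = []
    for pattern in product((0, 1), repeat=n_exons):  # 1 = exon skipped
        skipped = sum(pattern)
        dist.append(p_skip ** skipped * (1 - p_skip) ** (n_exons - skipped))
    return dist


if __name__ == "__main__":
    # Compare a gene with many rarely skipped exons to one with few,
    # frequently skipped exons (hypothetical parameter values).
    for n_exons, p_skip in [(10, 0.01), (3, 0.5)]:
        dist = isoform_distribution(n_exons, p_skip)
        print(n_exons, p_skip, shannon_entropy(dist), true_diversity(dist))
```

In this toy setting a distribution dominated by one isoform has a true diversity close to 1 even when many minor isoforms exist, which is one way to read the abstract's remark that dominant isoforms are co-expressed alongside many minor ones.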

Similar resources

Role of Aberrant Alternative Splicing in Cancer

Alternative splicing can alter genome sequence and, as a consequence, many genes change to oncogenes. This event can also affect protein function and diversity. A growing number of studies elucidate the pathological influence of impaired alternative splicing events on numerous diseases, including cancer. Here, we would like to highlight the significant role of alternative splicing in cancer biolog...

P87: The Role of the Long Non-Coding RNA Sequences (LncRNAs) in Neurological Disorders

Precise interpretation of transcriptome sequences in several species has shown that the major part of the genome is transcribed; however, only a small fraction of the transcribed sequences have open reading frames that are conserved during evolution. So, it is unlikely that many of the transcribed sequences code for proteins. Among all human non-coding transcripts, at least 10000...

Colony Forming Unit Endothelial Cells Do not Exhibit Telomerase Alternative Splicing Variants and Activity

Introduction: Endothelial progenitor colony forming unit-endothelial cells (CFU-EC) were first believed to be the progenitors of endothelial cells, named endothelial progenitor cells. Further studies revealed that they are monocytes regulating vasculogenesis. The main hindrance to using these cells for therapeutic purposes is their low frequency and limited replicative potential. This study was unde...

Long non-coding RNAs and their significance in human diseases

Protein-coding genes account for only a small fraction of the human genome and most of the genomic sequences are transcriptionally silent, but recent observations indicate significant functional elements, including non-coding protein transcripts in the human genome. Long non-coding RNAs (lncRNAs) have been defined as transcripts of >200 nucleotides without protein-coding capacity that perform t...

The Role of Long Non Coding RNAs in the Repair of DNA Double Strand Breaks

DNA double strand breaks (DSBs) are abrasions caused in both strands of the DNA duplex following exposure to both exogenous and endogenous conditions. Such abrasions have deleterious effects in cells, leading to genome rearrangements and cell death. A number of repair systems, including homologous recombination (HR) and non-homologous end-joining (NHEJ), have evolved to minimize the fatal effe...

Publication date: 2016